AITopics | hop length

Diffusion Buffer: Online Diffusion-based Speech Enhancement with Sub-Second Latency

Lay, Bunlong, Makarov, Rostislav, Gerkmann, Timo

arXiv.org Artificial Intelligence, Sep-15-2025

Diffusion models are a class of generative models that have recently been used for speech enhancement with remarkable success, but they are computationally expensive at inference time, which makes them impractical for processing streaming data in real time. In this work, we adapt a sliding window diffusion framework to the speech enhancement task. Our approach progressively corrupts speech signals through time, assigning more noise to frames close to the present in a buffer. This approach outputs denoised frames with a delay proportional to the chosen buffer size, enabling a trade-off between performance and latency. Empirical results demonstrate that our method outperforms standard diffusion models and runs efficiently on a GPU, achieving an input-output latency on the order of 0.3 to 1 second. This marks the first practical diffusion-based solution for online speech enhancement.
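
The buffer idea in this abstract can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the authors' implementation: the linear noise schedule, the function names `run_buffer` and `latency_seconds`, and the hop length and sample rate used below are all hypothetical. Each incoming frame enters the buffer fully noised, every frame in the buffer receives one denoising step per incoming frame, and the oldest frame is emitted once its noise reaches zero.

```python
from collections import deque


def run_buffer(n_frames, buffer_size):
    """Simulate a sliding denoising buffer (assumed linear schedule).

    Returns a list of (frame_index, arrival_index_at_emission) pairs.
    """
    buffer = deque()  # items are [frame_index, remaining_denoising_steps]
    emitted = []
    for t in range(n_frames):
        # The newest frame enters with the maximum noise level.
        buffer.append([t, buffer_size])
        # One denoising step is applied to every frame in the buffer,
        # so older frames always carry less noise than newer ones.
        for item in buffer:
            item[1] -= 1
        # The oldest frame leaves the buffer once it is fully denoised.
        if buffer[0][1] == 0:
            frame_index, _ = buffer.popleft()
            emitted.append((frame_index, t))
    return emitted


def latency_seconds(buffer_size, hop_length, sample_rate):
    """Delay of this toy buffer, in seconds, for a given hop length."""
    return (buffer_size - 1) * hop_length / sample_rate


if __name__ == "__main__":
    for frame_index, arrival in run_buffer(n_frames=8, buffer_size=4):
        print(f"frame {frame_index} emitted when frame {arrival} arrives")
    # Hypothetical settings: 20-frame buffer, hop of 256 samples at 16 kHz.
    print(latency_seconds(20, 256, 16000))
```

In this sketch the delay grows linearly with the buffer size, which mirrors the performance/latency trade-off described above; the actual schedule and step count used in the paper may differ.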

artificial intelligence, machine learning, reverse step, (13 more...)

arXiv.org Artificial Intelligence

2506.02908

Country: Europe (0.14)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Skeleton-Guided Learning for Shortest Path Search

Liu, Tiantian, Li, Xiao, Li, Huan, Lu, Hua, Jensen, Christian S., Xu, Jianliang

arXiv.org Artificial Intelligence, Aug-5-2025

Shortest path search is a core operation in graph-based applications, yet existing methods face important limitations. Classical algorithms such as Dijkstra's and A* become inefficient as graphs grow more complex, while index-based techniques often require substantial preprocessing and storage. Recent learning-based approaches typically focus on spatial graphs and rely on context-specific features like geographic coordinates, limiting their general applicability. We propose a versatile learning-based framework for shortest path search on generic graphs, without requiring domain-specific features. At the core of our approach is the construction of a skeleton graph that captures multi-level distance and hop information in a compact form. A Skeleton Graph Neural Network (SGNN) operates on this structure to learn node embeddings and predict distances and hop lengths between node pairs. These predictions support LSearch, a guided search algorithm that uses model-driven pruning to reduce the search space while preserving accuracy. To handle larger graphs, we introduce a hierarchical training strategy that partitions the graph into subgraphs with individually trained SGNNs. This structure enables HLSearch, an extension of our method for efficient path search across graph partitions. Experiments on five diverse real-world graphs demonstrate that our framework achieves strong performance across graph types, offering a flexible and effective solution for learning-based shortest path search.
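
To make the idea of prediction-guided pruning concrete, here is a minimal sketch. It is not the authors' LSearch algorithm: it is a plain Dijkstra search that skips any neighbor whose path cost plus an estimated remaining distance exceeds a slack-scaled estimate of the source-target distance. The function `guided_search`, the `slack` parameter, the toy graph, and the lookup table standing in for a learned distance predictor are all hypothetical.

```python
import heapq


def guided_search(graph, source, target, estimate, slack=1.5):
    """Dijkstra with estimate-based pruning.

    graph: dict mapping node -> list of (neighbor, weight) pairs.
    estimate: callable (u, v) -> predicted distance (a stand-in for a model).
    Returns (distance, path, expanded_count), or None if nothing survives pruning.
    """
    bound = slack * estimate(source, target)
    best = {source: 0.0}
    parent = {source: None}
    heap = [(0.0, source)]
    expanded = 0
    while heap:
        g, u = heapq.heappop(heap)
        if g > best.get(u, float("inf")):
            continue  # stale heap entry
        if u == target:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return g, path[::-1], expanded
        expanded += 1
        for v, w in graph[u]:
            cand = g + w
            # Prune neighbors that the estimate says cannot lie on a short path.
            if cand + estimate(v, target) > bound:
                continue
            if cand < best.get(v, float("inf")):
                best[v] = cand
                parent[v] = u
                heapq.heappush(heap, (cand, v))
    return None


# Toy undirected graph; the shortest A-to-D path is A-B-C-D with length 3.
edges = [("A", "B", 1), ("B", "C", 1), ("C", "D", 1),
         ("A", "E", 4), ("E", "D", 4), ("B", "F", 5), ("F", "D", 5)]
graph = {}
for u, v, w in edges:
    graph.setdefault(u, []).append((v, w))
    graph.setdefault(v, []).append((u, w))

# Hypothetical predicted distances to D, standing in for model output.
predicted_to_d = {"A": 3, "B": 2, "C": 1, "D": 0, "E": 4, "F": 5}
result = guided_search(graph, "A", "D", lambda u, v: predicted_to_d[u])
print(result)
```

With inaccurate estimates or a tight `slack`, this kind of pruning can discard the true shortest path, so the sketch trades exactness for a smaller search space.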

data mining, machine learning, vertex, (19 more...)

arXiv.org Artificial Intelligence

2508.0227

Country:

Europe > Denmark (0.46)
North America > United States (0.46)
Asia > China (0.28)

Genre:

Research Report (0.82)
Overview (0.67)

Industry: Transportation > Infrastructure & Services (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(2 more...)

Learnable Adaptive Time-Frequency Representation via Differentiable Short-Time Fourier Transform

Leiber, Maxime, Marnissi, Yosra, Barrau, Axel, Meignen, Sylvain, Massoulié, Laurent

arXiv.org Artificial Intelligence, Jun-27-2025

The short-time Fourier transform (STFT) is widely used for analyzing non-stationary signals. However, its performance is highly sensitive to its parameters, and manual or heuristic tuning often yields suboptimal results. To overcome this limitation, we propose a unified differentiable formulation of the STFT that enables gradient-based optimization of its parameters. This approach addresses the limitations of traditional STFT parameter tuning methods, which often rely on computationally intensive discrete searches. It enables fine-tuning of the time-frequency representation (TFR) based on any desired criterion. Moreover, our approach integrates seamlessly with neural networks, allowing joint optimization of the STFT parameters and network weights. The efficacy of the proposed differentiable STFT in enhancing TFRs and improving performance in downstream tasks is demonstrated through experiments on both simulated and real-world data.

artificial intelligence, machine learning, window length, (18 more...)

arXiv.org Artificial Intelligence

2506.2144

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.93)

Textless NLP -- Zero Resource Challenge with Low Resource Compute

Ramadass, Krithiga, Singh, Abrit Pal, J, Srihari, Kalyani, Sheetal

arXiv.org Artificial Intelligence, Sep-24-2024

The availability of text data for low-resource languages has always been a challenge and transfer learning from multilingual models has its own limitations. End-to-End spoken systems without involving text have received significant attention in the recent years. The Zero-Resource challenge (ZRC) [1] has enabled addressing the low-resource language representation problem and has been a significant driver in this area. In the acoustic unit discovery task for ZRC, high-dimensional input speech data is mapped to its latent representation to [...]

[...] Coding (VQ-CPC) [8] as the encoder in our speech processing pipeline. The input audio files are preprocessed and extracted as log-Mel spectrograms. The initial processing involves convolution and normalization layers to extract high-level features. These features are then passed through an auto-regressive network, which predicts future representations of the input based on past information. One of the key characteristics of VQ-CPC is its use of vector quantization as a bottleneck to discretize the continuous embeddings extracted by the autoregressive network into a finite set of discrete codes.

architecture, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2409.19015

Country: Asia > India > Tamil Nadu > Chennai (0.05)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Optimising MFCC parameters for the automatic detection of respiratory diseases

Yan, Yuyang, Simons, Sami O., van Bemmel, Loes, Reinders, Lauren, Franssen, Frits M. E., Urovi, Visara

arXiv.org Artificial Intelligence, Aug-14-2024

Voice signals originating from the respiratory tract are utilized as valuable acoustic biomarkers for the diagnosis and assessment of respiratory diseases. Among the employed acoustic features, Mel Frequency Cepstral Coefficients (MFCCs) are widely used for automatic analysis, with MFCC extraction commonly relying on default parameters. However, no comprehensive study has systematically investigated the impact of MFCC extraction parameters on respiratory disease diagnosis. In this study, we address this gap by examining the effects of key parameters, namely the number of coefficients, frame length, and hop length between frames, on respiratory condition examination. Our investigation uses four datasets: the Cambridge COVID-19 Sound database, the Coswara dataset, the Saarbrucken Voice Disorders (SVD) database, and a TACTICAS dataset. The Support Vector Machine (SVM) is employed as the classifier, given its widespread adoption and efficacy. Our findings indicate that the accuracy obtained with MFCCs decreases as hop length increases, and the optimal number of coefficients is observed to be approximately 30. The performance of MFCCs varies with frame length across the datasets: for the COVID-19 datasets (Cambridge COVID-19 Sound database and Coswara dataset), performance declines with longer frame lengths, while for the SVD dataset, performance improves with increasing frame length (from 50 ms to 500 ms). Furthermore, we investigate the optimized combination of these parameters and observe substantial enhancements in accuracy. With the optimized combination, the SVM model achieves accuracies of 81.1%, 80.6%, and 71.7%, improvements of 19.6%, 16.1%, and 14.9% over the worst combination, for the Cambridge COVID-19 Sound database, the Coswara dataset, and the SVD dataset respectively.
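
The three parameters studied here jointly determine the size of the MFCC feature matrix. The following sketch, which is an illustration and not the authors' pipeline, converts frame and hop lengths from milliseconds to samples and reports the resulting matrix shape for a small parameter grid. The helper names `ms_to_samples` and `mfcc_shape`, the 16 kHz sample rate, the 3-second recording, and the grid values are all assumptions; no padding is applied.

```python
from itertools import product


def ms_to_samples(ms, sample_rate):
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(round(ms * sample_rate / 1000))


def mfcc_shape(n_samples, sample_rate, n_coeffs, frame_ms, hop_ms):
    """Shape (n_coeffs, n_frames) of an MFCC matrix without padding."""
    frame = ms_to_samples(frame_ms, sample_rate)
    hop = ms_to_samples(hop_ms, sample_rate)
    if n_samples < frame:
        return n_coeffs, 0
    n_frames = 1 + (n_samples - frame) // hop
    return n_coeffs, n_frames


if __name__ == "__main__":
    sample_rate = 16000           # assumed sample rate
    n_samples = 3 * sample_rate   # assumed 3-second recording
    # Hypothetical grid over coefficients, frame length (ms), hop length (ms).
    for n_coeffs, frame_ms, hop_ms in product([13, 30], [50, 500], [10, 25]):
        shape = mfcc_shape(n_samples, sample_rate, n_coeffs, frame_ms, hop_ms)
        print(n_coeffs, frame_ms, hop_ms, shape)
```

A larger hop length yields fewer frames per recording, which is one plausible reason accuracy could drop as hop length grows; the abstract reports the effect but this sketch does not establish its cause.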

coefficient, dataset, frame length, (17 more...)

arXiv.org Artificial Intelligence

2408.07522

Country:

Europe > Germany > Saarland > Saarbrücken (0.25)
Europe > Netherlands > Limburg > Maastricht (0.05)
Europe > Spain > Galicia > Madrid (0.04)
Asia > India > Karnataka > Bengaluru (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.89)

Differentiable short-time Fourier transform with respect to the hop length

Leiber, Maxime, Marnissi, Yosra, Barrau, Axel, Badaoui, Mohammed El

arXiv.org Artificial Intelligence, Jul-26-2023

The short-time Fourier transform (STFT) is a frequently used tool for analyzing non-stationary digital signals in various fields including audio (Stafford et al. [1998]), medicine (Huang et al. [2019]), and vibration analysis (Leclère et al. [2016]). Spectrograms, which are obtained from the STFT magnitude, are essential for visualizing, understanding, and processing non-stationary signals in time-frequency representation. The STFT parameters, including tapering function, window length, and hop length, are critical and depend on the application and signal characteristics. The tapering function balances frequency resolution and spectral leakage, with a narrower main lobe providing better frequency resolution at the expense of increased spectral leakage, and a wider main lobe reducing spectral leakage but decreasing frequency resolution. The Hann or Hamming window is a common starting point, but the best choice depends on the application's specific requirements. Most studies on STFT parameters have focused on the choice of the window length, as it determines the time-frequency resolution trade-off. A shorter window length provides better time resolution but poor frequency resolution. Conversely, a longer window length provides better frequency resolution but poor time resolution. To provide more precise control over temporal and frequency resolution based on the local characteristics of the input signal, researchers have proposed using variable-length windows.
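
The window-length and hop-length trade-off described above can be seen in a naive STFT. This is a minimal standard-library sketch, not the differentiable formulation proposed in the paper: the function names `hann` and `stft`, the 8 kHz sample rate, the 1 kHz test tone, and the two parameter settings are assumptions chosen for illustration, and no padding is applied.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft(signal, win_length, hop_length):
    """Naive STFT: one DFT per Hann-windowed frame, no padding.

    Returns a list of frames, each a list of win_length // 2 + 1 complex bins.
    """
    if len(signal) < win_length:
        raise ValueError("signal is shorter than one window")
    window = hann(win_length)
    n_frames = 1 + (len(signal) - win_length) // hop_length
    n_bins = win_length // 2 + 1
    frames = []
    for m in range(n_frames):
        start = m * hop_length  # the hop length sets the spacing between frames
        seg = [signal[start + i] * window[i] for i in range(win_length)]
        spectrum = [
            sum(seg[i] * cmath.exp(-2j * math.pi * k * i / win_length)
                for i in range(win_length))
            for k in range(n_bins)
        ]
        frames.append(spectrum)
    return frames


sample_rate = 8000  # assumed sample rate in Hz
signal = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]

short = stft(signal, win_length=64, hop_length=16)    # finer time grid
long = stft(signal, win_length=256, hop_length=64)    # finer frequency grid

for name, frames, win in (("short", short, 64), ("long", long, 256)):
    print(name, len(frames), "frames,", sample_rate / win, "Hz per bin")
```

For this 512-sample signal the short window gives 29 frames with 125 Hz bin spacing, while the long window gives 5 frames with 31.25 Hz bin spacing: more frames in time at the cost of coarser frequency bins, and vice versa.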

data quality, hop length, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2308.02421

Country: Pacific Ocean (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Data Science > Data Quality > Data Transformation (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

Play It Back: Iterative Attention for Audio Recognition

Stergiou, Alexandros, Damen, Dima

arXiv.org Artificial Intelligence, Mar-12-2023

A key function of auditory cognition is the association of characteristic sounds with their corresponding semantics over time. Humans attempting to discriminate between fine-grained audio categories often replay the same discriminative sounds to increase their prediction confidence. We propose an end-to-end attention-based architecture that, through selective repetition, attends over the most discriminative sounds across the audio sequence. Our model initially uses the full audio sequence and iteratively refines the temporal segments replayed based on slot attention. At each playback, the selected segments are replayed using a smaller hop length, which yields higher-resolution features within these segments. We show that our method can consistently achieve state-of-the-art performance across three audio-classification benchmarks: AudioSet, VGG-Sound, and EPIC-KITCHENS-100.

artificial intelligence, machine learning, playback, (19 more...)

arXiv.org Artificial Intelligence

2210.11328

Country:

Europe > United Kingdom > England > Bristol (0.04)
Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
